The gapminder data are related to a famous TED talk given by Hans Rosling. In his talk, Dr. Rosling shows an animated visualization depicting the relationship between life expectancy and average income levels by country. Our goal in this module is to reproduce Dr. Rosling’s visualization.
We will access the gapminder data from the gapminder R package. This package contains a dataset (technically, a tibble) called gapminder with 6 variables:
| variable | meaning |
|---|---|
| country | country |
| continent | continent |
| year | year |
| lifeExp | life expectancy at birth |
| pop | total population |
| gdpPercap | per-capita GDP |
Per-capita GDP (Gross domestic product) is given in units of international dollars, “a hypothetical unit of currency that has the same purchasing power parity that the U.S. dollar had in the United States at a given point in time” – 2005, in this case.
Note: the gapminder R package exists for the purpose of teaching and making code examples. It is an excerpt of data found in specific spreadsheets on Gapminder.org circa 2010. It is not a definitive source of socioeconomic data.
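A quick way to confirm the structure described in the table above is to inspect the tibble directly. This is a minimal sketch that assumes only that the gapminder package is installed:

```r
library(gapminder)

# dimensions: number of rows, then number of columns
dim(gapminder)

# variable names; these should match the table above
names(gapminder)

# first few rows of the data
head(gapminder)
```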
Before starting these exercises, you should have a good understanding of:
- The Data Visualization Basics Primer
- Chapters 1-3 of R for Data Science
Load the tidyverse and gapminder packages. We are using tidyverse to access the ggplot2 package and using gapminder to access the data.
library(gapminder)
library(tidyverse)
# tell knitr to use the project working directory
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
Note that I am using the chunk options message = FALSE, echo = TRUE because loading R packages often produces printed output that would otherwise show up in your knitted Rmarkdown document. Setting message = FALSE suppresses printed messages, while setting echo = TRUE ensures that the code in your chunk is printed. This is how I would like you to organize loading packages in your homework .Rmd files.
In gapminder, each country has 12 rows distinguished by year.
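The claim above can be checked with a few lines of base R. This sketch uses `table()` to count rows per country; the variable name `rows_per_country` is just an illustrative choice:

```r
library(gapminder)

# count the rows for each country; every count should be 12
rows_per_country <- table(gapminder$country)
unique(as.vector(rows_per_country))

# the distinct years that distinguish those 12 rows
sort(unique(gapminder$year))
```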
Create a scatter plot using gdpPercap as the x-variable and lifeExp as the y-variable:
Modify your figure from exercise 1: transform the scale of your x-axis to be in log base 10 units. (See ?scale_x_log10)
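To see what a log base 10 axis does without giving away the exercise, here is a small sketch on made-up data. The data frame `toy` and its columns `income` and `score` are hypothetical and are not part of gapminder:

```r
library(ggplot2)

# hypothetical toy data, not taken from gapminder
toy <- data.frame(
  income = c(100, 1000, 10000, 100000),
  score  = c(1, 2, 3, 4)
)

# on the original scale, the three smallest incomes crowd together
p_linear <- ggplot(toy, aes(x = income, y = score)) +
  geom_point()

# on a log10 scale, each tenfold increase spans the same horizontal distance
p_log <- p_linear + scale_x_log10()

p_log
```

Comparing `p_linear` and `p_log` side by side shows why a log scale suits a variable like income, whose values span several orders of magnitude.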
Add x- and y-axis labels to your figure from exercise 2.
Add a smoothed curve to your plot, showing the overall population trend. (See ?geom_smooth)
Adjust the points in your graph:
- shape to be 21
- color to be 'black'
- fill to be 'grey'
Adjust the overall population trend as well:
- color to be 'red'
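Point shapes 21 through 25 have both an outline and an interior, which is why `color` and `fill` can be set separately for them. The following sketch illustrates the idea on the built-in `mtcars` data, used here only as a stand-in for the exercise figure:

```r
library(ggplot2)

# mtcars is a built-in dataset, used here only as a stand-in
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  # shape 21 is a circle: color sets the outline, fill sets the interior
  geom_point(shape = 21, color = "black", fill = "grey") +
  # for the smoothed trend, color sets the color of the line
  geom_smooth(color = "red")

p
```

Because these values are constants rather than variables in the data, they are passed directly to the geom instead of going inside `aes()`.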
Go to the ggplot2 theme() reference page and scroll through the pictures that show some of the built-in ggplot2 themes. Pick a theme that you like and add it to the figure you created in exercise 5.
There is something happening in the upper levels of income. The population trend between income and life expectancy changes direction. There is an R package called plotly that can help you explore ggplot figures interactively. Converting a ggplot2 figure into a plotly figure is straightforward:
library(plotly, warn.conflicts = FALSE)
# step 1: create a ggplot2 figure
# step 2: if you want to see data for individual points in your
# graph, you can set the label values using aes(), like this:
gg_figure <- read_rds('solutions/06_solution.rds') +
aes(label = country)
# step 3: use ggplotly on the figure
ggplotly(gg_figure)
You can tell by hovering your mouse over the far-right points in the figure that the country with higher income but lower life expectancy is Kuwait. Now re-create this figure, using year as the label instead of country, and identify the years that account for these points.
Once you’ve seen the year values associated with the points that have high income but lower-than-expected life expectancy, formulate a hypothesis explaining these data. After you’ve written your hypothesis down, go to Wikipedia’s Kuwait page and read about the country’s modern history. Was your hypothesis correct?
University of Alabama at Birmingham, bcjaeger@uab.edu